DICMERGE.EXE (VERSION 2)
DICTIONARY MAINTENANCE FOR JWRITE
1. Introduction.
JWRITE has no built-in functions for the maintenance of the dictionary. This
is nevertheless a desirable feature to have. If the standard dictionary does
not contain kanji equivalents of words which you frequently need in your field
of business, it would be nice to be able to add them to the dictionary.
This IS possible, using the enclosed program DICMERGE.EXE, but I must warn you
that this is a rather complicated operation. Please follow these instructions
carefully. Before you start, copy the dictionary (WNNSJIS.DIC) and its index
file (WNNSJIS.IND) somewhere for safekeeping.
2. Dictionary files
A dictionary file like WNNSJIS.DIC consists of lines of SJIS coded text. Every
line consists of the following elements, from left to right:
a- the KEYWORD, which must be in ASCII (hankaku romaji) or hiragana. Katakana
keywords are not allowed. From version 2, the keyword may be a "key phrase"
that includes spaces. A line may not, however, begin with a space; this is
considered a "sort violation" and causes the program to abort (see section 4).
b- one SPACE (don't forget this!)
c- one RIGHT SLASH (/).
d- one or more possible TRANSLATIONS for the keyword. The translations may
be written in any character type, katakana, hiragana, kanji, big or
small ascii, or special characters.
Every translation, including the last one, must be followed by a right
slash. Any text after the last slash will be ignored.
e- a LINE FEED CHARACTER for signalling the end of the line. It would be
possible to have carriage return - line feed combinations at the end of
each line, but in a dictionary containing tens of thousands of lines, that
would just be tens of thousands of extra bytes.
An example of a dictionary line is:
きこう /寄港/機構/気候/
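Rules a-d can be illustrated with a small parser. This is only a sketch in Python, not the code used by DICMERGE or JWRITE: the function name parse_dictionary_line is hypothetical, the line is assumed to have been decoded from Shift-JIS already, and the sample entry is the example line read as Shift-JIS.

```python
def parse_dictionary_line(line):
    """Split one dictionary line into (keyword, translations).

    Returns None if the line lacks the " /" separator or has no
    translation that is closed by a slash.
    """
    line = line.rstrip("\r\n")            # rule e: LF or CR/LF line ending
    # Rules a-c: keyword, one space, one slash. Splitting at the first
    # " /" lets a key phrase contain spaces of its own.
    keyword, separator, rest = line.partition(" /")
    if not separator or not keyword:
        return None
    # Rule d: every translation is followed by a slash, so whatever comes
    # after the last slash is dropped.
    translations = [part for part in rest.split("/")[:-1] if part]
    if not translations:
        return None
    return keyword, translations


print(parse_dictionary_line("きこう /寄港/機構/気候/\n"))
```

A line with only one slash yields None here, which matches the way such lines are described as illegally formed in section 4.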
In other words, the dictionary is just a text file which can be read and
edited (in principle) by JWRITE itself. In theory, JWRITE could be directly
used for maintenance work on its own dictionary. Unfortunately, the size of the
dictionary file makes this impossible in practice. The dictionary file cannot
possibly fit in memory. (You can try loading it with JWRITE WNNSJIS.DIC, to see
the beginning of the dictionary, but please do not actually try to change
anything. You can also view, but not edit, the entire dictionary if you use
Vernon Buerg's LIST.COM, version 7.5i. Use the /B switch to let LIST run under
KDPLUS. If you use the KDPLUS keyboard input utility KJIN, you can even
look up words in the dictionary using LIST.)
However, you can add new information to the dictionary by making a small
dictionary for yourself, containing the information that you want to add, and
merging it with the existing dictionary. This can be done with the program
DICMERGE.
Notice that it is only possible to ADD to the dictionary this way. You cannot
remove anything from it, at least not with this utility. Utilities for removing
dictionary lines can, however, be made. They should overwrite the unneeded
lines with spaces; a DICMERGE operation on the file will then re-create a
valid index.
3. Making an update dictionary
You make your update dictionary as a text file, using the rules a-d specified
above (rule e is not important for the update file, because DICMERGE will
convert CR/LF combinations to single LF's). You can add completely new
keywords with their translations, and also new translations for existing
keywords. For instance, the present version of the WNNSJIS.DIC has only one
translation for the keyword ちゅうい, namely 注意. Now imagine that you often
need military terms, and you want to have the word 中尉 (also pronounced
ちゅうい) in the dictionary as well. Your update text (call it, for instance,
PRIVATE.DIC) must then contain the line
ちゅうい /中尉/
(Because 中尉 is not in the dictionary yet, you must construct the word from
the separate kanji 中 and 尉, which are in the dictionary as single kanji,
to be found through their pronunciations ちゅう and い).
Your update text may contain many such lines. It is IMPORTANT (in fact this is
the most important and the most difficult bit of the whole operation) that the
file be SORTED:
-the lines with alphabetical keyword must come before the lines
with hiragana keyword
-the lines with alphabetical keyword must be in alphabetical order (in fact,
in standard ASCII order. Look at any ASCII table).
-the lines with hiragana keyword must be in Japanese kana order
(あ-い-う-え-お-か-き-く-け....etc.)
When you have finished, save the file. If you're not sure about the sorting,
use the DOS SORT utility.
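If you would rather check the order yourself before running DICMERGE, the three sorting rules can be sketched in Python as follows. The sketch rests on an assumption that is not confirmed by the program: that keywords are compared by their Shift-JIS bytes. Under that assumption ASCII keywords (bytes below 0x80) automatically come before hiragana keywords (lead byte 0x82), and hiragana come out in Shift-JIS code order, which is taken here to be the "Japanese kana order". The function names are hypothetical.

```python
def keyword_of(line):
    """Return the keyword: everything before the first " /" separator."""
    return line.rstrip("\r\n").partition(" /")[0]


def sort_key(line):
    # Assumption: keywords are ordered by their raw Shift-JIS bytes.
    return keyword_of(line).encode("shift_jis")


def first_sort_violation(lines):
    """Return the 0-based index of the first line whose keyword sorts
    before the keyword of the preceding line, or None if all is well."""
    previous = None
    for index, line in enumerate(lines):
        key = sort_key(line)
        if previous is not None and key < previous:
            return index
        previous = key
    return None


update = ["zebra /y/", "abc /x/", "きこう /機構/", "あい /愛/"]
print(first_sort_violation(update))            # position of the first problem
update.sort(key=sort_key)                      # put the lines in order
print(first_sort_violation(update))
```

Sorting with the same key, as in the last lines, is one way to repair a file that fails the check.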
4. Merging with the existing dictionary
Now go to DOS and type
DICMERGE
Type in the names of the 2 dictionaries: file 1 is PRIVATE.DIC, file 2 is the
existing dictionary, WNNSJIS.DIC. The new (merged) dictionary will be called
MERGE.DIC; at the same time an index file will be made for it, MERGE.IND.
MERGE.DIC will separate lines by line feeds only, no carriage returns.
If you type in only one dictionary name, and just press ENTER when asked
for the other one, DICMERGE will still run; its only function will then
be to re-create the index for the one dictionary that you specified. (You
might need this if the index file has become lost or corrupted).
The merge process will take some time (a minute or so, for a large dictionary).
You can follow its progress on your screen, as new entries are constructed for
the index (this works best when you are in the KDPLUS environment).
Lines which are illegally-formed (e.g. lines with only one slash in them, or
lines which begin with a katakana or a kanji) are discarded. The program will
warn you, but continue with the merge process.
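The two examples of illegal lines given here can be expressed as a small filter. This is a sketch in Python under stated assumptions, not the checking that DICMERGE really performs: the names is_hiragana and is_legal_line are hypothetical, only the first character of the line is examined, and "hiragana" is taken to mean the Unicode range U+3041 to U+3093 after decoding from Shift-JIS.

```python
def is_hiragana(ch):
    """True for the hiragana from small a (U+3041) to n (U+3093)."""
    return "\u3041" <= ch <= "\u3093"


def is_legal_line(line):
    """Rough legality test: the line must start with an ASCII or hiragana
    character and contain at least two slashes."""
    line = line.rstrip("\r\n")
    if not line:
        return False
    first = line[0]
    if not (first.isascii() or is_hiragana(first)):
        return False                      # begins with katakana, kanji, ...
    return line.count("/") >= 2           # a single slash is not enough


candidates = ["ちゅうい /中尉/", "チュウイ /中尉/", "中尉 /ちゅうい/", "abc /x"]
kept = [line for line in candidates if is_legal_line(line)]
print(kept)
```

A line that starts with a space passes this particular test; according to section 2 such a line is treated as a sort violation instead, so it is not handled here.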
However, lines which are otherwise legally-formed but are not in proper sorted
order will cause the program to abort, displaying the location of the "sort
violation". You can then try to correct the situation before proceeding to the
next step. If the program halts for that reason, there will be no MERGE.DIC
and MERGE.IND files generated (for your protection).
When the merge process is finished, you must enter the following commands:
del wnnsjis.* (I hope you saved the old version somewhere..)
ren merge.* wnnsjis.*
From that moment, the new keywords and new meanings have been added to the
dictionary, and are accessible by means of the ALT-L function of JWRITE. If
there were "merged" entries (the same keyword occurring in both input
dictionaries but with different translations) the translations from the
first input dictionary will be listed first on the corresponding line of the
output dictionary.
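The merging behaviour described above can be sketched in Python as an ordinary two-way merge of sorted lists. This is an illustration only, not the algorithm inside DICMERGE. It assumes that both inputs are already sorted and well formed, that a keyword occurs at most once per input, that keywords are compared by their Shift-JIS bytes, and that a translation present in both inputs is written only once. The function names are hypothetical, and no index file is produced.

```python
def parse(line):
    """Split a dictionary line into (keyword, list of translations)."""
    keyword, _, rest = line.rstrip("\r\n").partition(" /")
    return keyword, rest.split("/")[:-1]


def order(keyword):
    # Assumption: keywords are compared by their Shift-JIS bytes.
    return keyword.encode("shift_jis")


def merge_dictionaries(first, second):
    """Merge two sorted lists of dictionary lines into one sorted list."""
    merged = []
    i = j = 0
    while i < len(first) and j < len(second):
        kw1, tr1 = parse(first[i])
        kw2, tr2 = parse(second[j])
        if order(kw1) < order(kw2):
            keyword, translations = kw1, tr1
            i += 1
        elif order(kw2) < order(kw1):
            keyword, translations = kw2, tr2
            j += 1
        else:
            # Same keyword in both inputs: translations of file 1 come first.
            keyword = kw1
            translations = tr1 + [t for t in tr2 if t not in tr1]
            i += 1
            j += 1
        merged.append(keyword + " /" + "/".join(translations) + "/")
    # One input is exhausted; copy what is left of the other.
    for line in first[i:] + second[j:]:
        keyword, translations = parse(line)
        merged.append(keyword + " /" + "/".join(translations) + "/")
    return merged


private = ["ちゅうい /中尉/"]
existing = ["きこう /寄港/機構/気候/", "ちゅうい /注意/"]
for line in merge_dictionaries(private, existing):
    print(line)
```

To write the result with line feeds only, the output file could be opened with open(name, "w", encoding="shift_jis", newline="\n").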
5. A tip.
Here and there in public domain sources you can find dictionary files. If
you are sure that they conform to the rules mentioned in section 2, you can
merge them with your existing dictionary to increase its capabilities. There
will be a penalty: the bigger the dictionary, the slower it will be in general.
Test it with a "slow word", like 新聞 (the position of this word in the list
is such that looking for it will take some time, more than a second).
6. New features in version 2.
It is now possible to do a DICMERGE on only one dictionary (just press ENTER
when asked for the other one). This will re-create the index.
The program performs more stringent checking on the dictionary lines, reducing
the chances of destroying your dictionary by merging it with a file containing
illegal lines.
It is now possible (in principle) to delete lines from the dictionary by
overwriting them with spaces.
"Key phrases" (with spaces in them) are now allowed. However, the lookup
mechanism of JWRITE will not recognize them unless your version of JWRITE
is 1.5 or higher.
Katakana keywords are now detected and discarded. In the previous version
it seemed that you could get away with including katakana keywords (if you
put them at the end of the update dictionary), but in fact each katakana entry
would make some alphabetic entries inaccessible by destroying the index
pointers to them.
7. Note.
A file NUMBERS.DIC, with which you can extend the dictionary with novel
number symbols like ⑤ and Ⅷ, has been provided in this archive for
test purposes.
Tokyo, 5 January 1992 (Revised 3 March 1992)
Jan W. Stumpel